[TITLE OF DOCUMENT] SPECIFICATION

[TITLE OF THE INVENTION] cDNA Fragment of Causative Gene of Spinocerebellar Ataxia Type 2

[CLAIMS] 

[Claim 1] A DNA fragment comprising a DNA region encoding an
amino acid sequence shown in SEQ ID NO: 1 (provided that the number of repeat
units of Gln from the 166th to 188th amino acid varies between 15 and 100).

[Claim 2] The DNA fragment according to claim 1, wherein said DNA
region is the region from 49nt to 3987nt (provided that the number of repeat units of
CAG or CAA in the region from the 543nt to 612nt varies between 15 and 100, and
that the CAA in this region may be CAG).
[DETAILED DESCRIPTION OF THE INVENTION] 
[0001] 

[Technical Field of the Invention] 
The present invention relates to cDNA fragments of the causative gene of
spinocerebellar ataxia type 2 (hereinafter also referred to as "SCA2").
[0002] 
[Prior Art] 

SCA2 is an autosomal dominant neurodegenerative disorder that affects the
cerebellum and other areas of the central nervous system.
[0003] 

It has recently been discovered that the causative genes of five
neurodegenerative diseases, including dentatorubral-pallidoluysian atrophy
(DRPLA), have more CAG repeats than the corresponding normal genes. That is, the
number of CAG repeats in the causative genes of these diseases is 37 to 100,
while that in the normal genes is less than 35.

[0004] 
It has been suggested that the causative gene of SCA2 has an increased
number of CAG repeats (Trottier, Y. et al., Nature, 378, 403-406 (1995)). However,
since the causative gene of SCA2 has not been identified, and since its nucleotide
sequence has not been determined, SCA2 cannot be diagnosed by a genetic assay.
[0005]

[Problems Which the Invention Tries to Solve] 

An object of the present invention is to provide a sequence-determined cDNA 
fragment of the causative gene of SCA2. 

[0006] 

[Means for Solving the Problems]

Through intensive study, the present inventors discovered a TspEI fragment with
a size of 2.5 kb in which the number of CAG triplets is increased only in SCA2
patients, and determined its partial sequence. A human cDNA library was
screened using as probes oligonucleotides that hybridize respectively with the two
regions flanking the CAG triplet repeats, and a cDNA fragment which hybridizes
with both of these probes was cloned. Using this cDNA fragment as a probe, human
cDNA libraries were screened and a plurality of cDNA fragments which hybridize
with the probe were cloned. Sequencing the cDNA fragments revealed that they
overlap with each other. To sequence the 5'-end and 3'-end regions, RACE (rapid
amplification of cDNA ends) was performed. Further, to sequence the 5'-end
region, RT-PCR was performed. The full length of the cDNA of the causative gene
of SCA2 was thereby sequenced.

[0007] 

That is, the present invention provides a DNA fragment comprising a DNA
region encoding an amino acid sequence shown in SEQ ID NO: 1 (provided that the
number of repeat units of Gln from the 166th to 188th amino acid varies between 15
and 100).

[0008] 

[Modes for Carrying Out the Invention] 

As described above, the DNA fragment according to the present invention
comprises the DNA region encoding the amino acid sequence shown in SEQ ID NO:
1 in the SEQUENCE LISTING, provided that the number of repeat units of Gln
from the 166th to the 188th amino acid varies between 15 and 100. The number of
these repeat units is 15 to 25 in normal individuals and 35 to 100 in SCA2 patients.
As is well known, owing to the degeneracy of the genetic code, a plurality of codons
may encode one amino acid, and any DNA which encodes the amino acid sequence
shown in SEQ ID NO: 1 is included within the scope of the present invention. The
nucleotide sequence actually determined in the Examples below is shown in SEQ ID
NO: 1 and in Figs. 1-4. How the nucleotide sequence was determined, and the fact
that the cDNA having this nucleotide sequence is the cDNA of the causative gene of
SCA2, are detailed in the Examples below.
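
The degeneracy referred to above can be illustrated with the repeat region itself: CAG and CAA both encode Gln, so a tract in which a CAA interrupts a run of CAG codons still translates to an uninterrupted Gln stretch. The following is a minimal sketch in Python; the function name and the two-entry codon set are illustrative assumptions and are not part of the specification.

```python
# Illustrative excerpt of the standard genetic code: both codons encode Gln.
GLN_CODONS = {"CAG", "CAA"}


def translate_repeat_tract(dna):
    """Translate a tract consisting only of Gln codons into three-letter residues."""
    if len(dna) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    for codon in codons:
        if codon not in GLN_CODONS:
            raise ValueError(f"not a Gln codon: {codon}")
    return ["Gln"] * len(codons)


# 13 CAG, one CAA, then 9 CAG, as in the repeat region listed in SEQ ID NO: 1.
tract = "CAG" * 13 + "CAA" + "CAG" * 9
residues = translate_repeat_tract(tract)
print(len(residues))  # number of Gln residues encoded by the tract
```

Replacing the CAA by CAG leaves the encoded residues unchanged, which is the sense in which such variants encode the same amino acid sequence.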
[0009] 

Since the nucleotide sequence of the nucleic acid fragment according to the
present invention has been determined by the present invention, the nucleic acid
fragment may be cloned by PCR amplification using a human cDNA library as a
template. In cases where it is difficult to amplify the nucleic acid fragment by a
single PCR, the nucleic acid fragment may be divided into a plurality of regions, and
the PCR products may be ligated by a conventional method so as to clone the nucleic
acid fragment.
[0010] 
[Example]

The present invention will now be described more concretely by way of
examples thereof. It should be noted that the present invention is not restricted to
the following examples.
[0011] 

Example 1 Preparation of (CAG)55 Probe

A genomic DNA segment of the DRPLA gene containing a CAG repeat with 55
repeat units was amplified from the genomic DNA of a patient with DRPLA
(Koide, R. et al., Nature Genet., 6, 9-13 (1994)) and was subcloned into a plasmid
vector, pT7Blue T (p-2093). The p-2093 plasmid contains the (CAG)55 and the
flanking sequences. That is, the plasmid contains the sequence 5'-CAC CAC
CAG CAA CAG CAA (CAG)55 CAT CAC GGA AAC TCT GGG CC-3'. Using a
pair of oligonucleotides, 5'-CAC CAC CAG CAA CAG CAA CA-3' and
5'-biotin-GGC CCA GAG TTT CCG TGA TG-3', PCR was performed in a total volume
of 16 µl containing 10 mM Tris-HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 2 M
N,N,N-trimethylglycine, 0.1 mM TTP, 0.1 mM dCTP, 0.1 mM dGTP, 9.25 MBq of
[α-32P]dATP (222 TBq/mmol), 0.5 µM each of the two primers, 0.3 ng of plasmid DNA
(p-2093) and 2.0 U of Taq DNA polymerase (Takara Shuzo, Kyoto, Japan). After
an initial 2-min denaturation at 94°C, PCR was performed for 30 cycles consisting
of 1-min denaturation at 94°C, 1-min annealing at 54°C and 3-min extension at
72°C, followed by a final extension at 72°C for 10 min.
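
As a quick check on the cycling programme described above, the programmed hold times can be summed. This is only an arithmetic sketch in Python; the constant and function names are assumptions introduced here, and ramp times between temperatures are not modelled.

```python
# Cycling programme of Example 1, hold times in minutes.
INITIAL_DENATURATION = 2
CYCLES = 30
STEPS_PER_CYCLE = {"denaturation": 1, "annealing": 1, "extension": 3}
FINAL_EXTENSION = 10


def total_programmed_minutes():
    """Sum of programmed hold times; temperature ramping is ignored."""
    per_cycle = sum(STEPS_PER_CYCLE.values())
    return INITIAL_DENATURATION + CYCLES * per_cycle + FINAL_EXTENSION


print(total_programmed_minutes())  # 2 + 30 * 5 + 10 = 162
```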
[0012] 

A single-stranded (CAG)55 probe was isolated using streptavidin-coated
magnetic beads (Dynabeads M-280 Streptavidin; Dynal AS, Oslo, Norway) on which
20 µl of streptavidin is coated. That is, after washing of the PCR products
immobilized on the magnetic beads with 40 µl of a solution containing 5 mM
Tris-HCl (pH 7.5), 0.5 mM EDTA and 1 M NaCl, the non-biotinylated strand
containing the radiolabel was separated from the biotinylated strand by incubation
in 50 µl of 0.1 M NaOH for 10 min. The resultant supernatant was directly added
to the hybridization solution described below.
[0013]

Incidentally, using the single-stranded (CAG)55 probe prepared as described
above, Southern blot analysis was carried out on androgen receptor genes
containing 9, 22, 43 and 51 CAG repeat units, respectively. As a result, the
(CAG)55 probe strongly hybridized with the genes having 43 and 51 CAG repeat
units, but scarcely hybridized with the gene having 22 CAG repeat units,
and did not hybridize at all with the gene having 9 CAG repeat units (K. Sanpei et al.,
Biochemical and Biophysical Research Communications, Vol. 212, No. 2, 1995,
pp. 341-346). Thus, by using this probe, hybridization may be selectively attained
only with DNAs containing a large number of (e.g., 35 or more) CAG repeat units if
the hybridization conditions are appropriately selected.

[0014] 

(2) Determination of Nucleotide Sequence of SCA2 Gene 

Fig. 5 shows a pedigree chart of SCA2 patients. In this pedigree chart,
males are represented by squares and females are represented by circles. SCA2
patients are represented by black squares or circles, and unaffected persons are
represented by white squares or circles.
[0015] 

High-molecular-weight genomic DNA (15 µg) was digested with 100 U of
TspEI (Toyobo, Osaka, Japan), electrophoresed through 0.8% agarose gels and
transferred to nitrocellulose membranes. The membranes were hybridized with the
(CAG)55 probe described above. Hybridization was performed in a solution
containing 2.75 x SSPE (1 x SSPE = 150 mM NaCl, 10 mM NaH2PO4, 1 mM EDTA),
50% formamide, 5 x Denhardt's solution, 100 ng/ml of sheared salmon sperm DNA
and the (CAG)55 probe (6 x 10^6 cpm/ml) at 62°C for 18 hours. After the
hybridization, the membranes were washed with 1 x SSC (150 mM NaCl, 15 mM
sodium citrate) containing 0.5% SDS at 65°C for 0.5 hours. The membranes were
exposed to Kodak BioMax MS film for 16 hours at -70°C using an MS
intensifying screen.
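
The working concentrations implied by "2.75 x SSPE" follow directly from the 1 x composition given above. A minimal arithmetic sketch in Python (the dictionary and function names are assumptions introduced for illustration):

```python
# 1 x SSPE composition as defined in the text, in mM.
SSPE_1X_MM = {"NaCl": 150.0, "NaH2PO4": 10.0, "EDTA": 1.0}


def working_concentrations(fold):
    """Scale each 1 x component by the stated fold concentration."""
    return {name: fold * mm for name, mm in SSPE_1X_MM.items()}


conc = working_concentrations(2.75)
for name, mm in conc.items():
    print(f"{name}: {mm} mM")
```

Under this reading, the hybridization solution contains 412.5 mM NaCl, 27.5 mM NaH2PO4 and 2.75 mM EDTA from the SSPE component.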
[0016] 

As a result, a 2.5 kbp TspEI fragment which hybridized with the probe was
detected in all of the SCA2 patients, and only in them.
[0017] 

Genomic DNA (270 µg) from an SCA2 patient (individual 7 in Fig. 5) was
digested with TspEI and subjected to agarose gel electrophoresis. Genomic DNA
fragments including the 2.5 kb TspEI fragment were cloned into an EcoRI-cleaved
λZAPII vector. The genomic library was screened using the (CAG)55 probe under
the hybridization conditions described above. A genomic clone, Tsp-1, containing
an expanded CAG repeat was isolated.

[0018] 

After removal of the probe, the above-described genomic library was
screened again using Tsp-1 as a probe, which was labeled by random priming.
Hybridization was carried out in a solution containing 5 x SSC, 1 x Denhardt's
solution, 10% dextran sulfate, 20 mM sodium phosphate, 400 µg/ml human placental
DNA and the Tsp-1 probe at 42°C for 18 hours. After the hybridization, the
membranes were finally washed in 0.1 x SSC - 0.1% SDS at 52°C for 0.5 hours.
The membranes were exposed to Kodak BioMax MS film for 24 hours at
-70°C using an MS intensifying screen. As a result, a genomic clone, Tsp-2,
originating from a normal allele was isolated.
[0019] 

The SmaI-ApaI fragment (630 bp) of Tsp2 was sequenced and
oligonucleotides F-1 (5'-CCC TCA CCA TGT CGC TGA AGC-3') and R-1 (5'-CGA
CGC TAG AAG GCC GCT G-3') were designed such that the CAG repeat units are
sandwiched between the oligonucleotides (see Fig. 1). Using oligonucleotides F-1
and R-1 as probes, a human procephalic cortex cDNA library (STRATAGENE) was
screened. Hybridization was performed in a solution containing 6 x SSC, 10 x
Denhardt's solution, 0.5% SDS, 0.05% sodium pyrophosphate, 100 ng/ml of sheared
salmon sperm DNA and end-labeled oligonucleotide probes at 55°C for 18 hours.
After the hybridization, the membrane was finally washed with 6 x SSC containing
0.5% SDS and 0.05% sodium pyrophosphate at 55°C for 0.5 hours. A cDNA clone,
Fc1, with a size of 4.0 kb which hybridized with both probes was obtained. The
nucleotide sequences of Fc1, Tsp1 and Tsp2 were determined and compared. As a
result, the nucleotide sequences in the vicinities of the CAG repeat units were
identical except for the number of the CAG repeat units. Restriction maps of Tsp1
and Tsp2, as well as the sizes and positions of Fc1 and other fragments described
hereinbelow, are shown in Fig. 6. Using Fc1 or a fragment isolated by the screening
described later as a probe, human cDNA libraries (human procephalic cortex, human
fetal brain, human brain and brain stem) were screened to isolate cDNA clones Fc2,
Fb14, B4, C6 and C19 (see Fig. 6). To identify the 5'-end of Fc1, 5'-RACE
(Frohman, M.A. et al., Proc. Natl. Acad. Sci. USA 85, 8998-9002 (1988)) was
performed using 5'-RACE-Ready cDNA (Clontech, Palo Alto, CA, USA). Primer
R-1 was used for the first PCR, and Primer R-2 (5'-CTT GCG GAC ATT GGC AGC
C-3', see Fig. 1) was used for the second PCR. In both PCRs, F-1 (see Fig. 1) was
used as the forward primer. The 5'-RACE product (5R1) having a size of 350 bp
was subcloned into pT7Blue T vector (pT7Blue T-vector (5R1)). The identity
of 5R1 was confirmed by its overlap with the nucleotide sequences of Fc1, Tsp1
and Tsp2. To identify the 3'-end of the cDNA, 3'-RACE was performed using 1 µg
of poly(A)+ mRNA extracted from human brain as a template and Primer F-13
(5'-TTC TCT CAG CCA AAG CCT TCT ACT ACC-3', see Fig. 3) as a primer. The
obtained 3'-RACE product (3R1) with a size of 1300 bp was subcloned into pT7Blue
T vector (pT7Blue T-vector (3R1)).
[0020]

To investigate the 5'-end region of the cDNA, reverse transcription PCR
(RT-PCR) was performed. That is, total RNA extracted from an autopsied human
brain was digested with RNase-free DNase (PROMEGA) (Onodera, O. et al., Am. J.
Hum. Genet. 57, 1050-1060 (1995)). As the primers for the PCR, F1006 (5'-TAT
CCG CAG CTC CGC TCC C-3', see Fig. 1) and R1002 (5'-AGC CGG GCC GAA
ACG CGC CG-3') were used. PCR was performed in a solution with a total volume
of 20 µl, which contained 5 pmol each of the primers, 10 mM Tris-HCl (pH 8.3),
50 mM KCl, 1.5 mM MgCl2, 1.7 M N,N,N-trimethylglycine, 200 µM each of dATP,
dCTP and TTP, 100 µM of dGTP, 100 µM of 7-deaza dGTP and 2.5 U of Taq
polymerase (TAKARA SHUZO). After an initial denaturation at 96°C
for 2 minutes, a cycle of a denaturation step at 96°C for 1 minute, an annealing step
at 65°C for 1 minute and an extension step at 72°C for 1 minute was repeated 30
times, and a final extension step at 72°C for 5 minutes was performed. As a result,
a clone 5R1 which extends upstream of 5R1 by 246 bp was obtained (see Fig. 6).
[0021] 

In Fig. 6, the hollow regions in the Tsp1 and Tsp2 fragments indicate the
regions which exist in SCA2 cDNAs. The hollow regions in the SCA2 cDNA
show coding regions. The CAG repeat regions are shown as solid boxes.
Restriction sites TspEI (T), NotI (N), SacII (S), Sau3AI (Sa), SmaI (Sm), Eco52I
(E52), ApaI (Ap), AccI (Ac), BamHI (B), XhoI (X), EcoRI (E) and PstI (P) are
shown. The size and position of each cDNA clone are shown below the consensus
SCA2 cDNA.

[0022]

In this example, nucleotide sequences of double-stranded DNAs were
determined by the dideoxynucleotide chain termination method (Sanger, F. et al.,
Proc. Natl. Acad. Sci. USA 74, 5463-5467 (1977); Chen, E.Y. et al., DNA 4, 165-170
(1985)) using a double-stranded plasmid DNA as a template. To determine the
nucleotide sequences of the CAG repeat regions and their flanking regions,
genomic fragments containing the CAG repeat regions were amplified by PCR
using biotinylated F-1 and RS-1 (5'-CCT CGG TGT CGC GGC GAC TTC C-3').
PCR was performed in a solution with a total volume of 25 µl, which contained 0.25
µM each of the primers, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2.0 mM
MgCl2, 1.7 M N,N,N-trimethylglycine, 200 µM each of dNTP, 200 ng of the genomic
DNA and 1.25 U of Taq polymerase (TAKARA SHUZO). After an
initial denaturation at 95°C for 1 minute, a cycle of a denaturation step at 95°C for 2
minutes, an annealing step at 62°C for 1 minute and an extension step at 72°C for 1
minute was repeated 32 times, and a final extension step at 72°C for 5 minutes was
performed. Biotinylated strands were recovered using streptavidin-coated magnetic
beads and were directly sequenced.

[0023]

Based on the nucleotide sequences of the above-mentioned cDNA clones, a
consensus SCA2 cDNA sequence with a length of 4351 bp excluding the poly A tail
was determined (SEQ ID NO: 1, Figs. 1-4, see Fig. 6). In SEQ ID NO: 1, the region
from 4352nt to 4367nt is the poly A tail, and the number of "A" is not restricted to
that shown in SEQ ID NO: 1. It was confirmed that the poly A tail exists at the
same location in C19, B4 and 3R1, which were independent cDNA clones.
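
The coordinates quoted in this specification can be cross-checked with simple arithmetic. The sketch below, in Python, assumes the numbering used in SEQ ID NO: 1 (coding region from 49nt to 3987nt, Met 1 at 49nt); the constant and function names are introduced here for illustration only.

```python
CDS_START = 49       # first nucleotide of the ATG encoding Met 1
CDS_END = 3987       # last nucleotide of the final codon of the coding region
CDNA_LENGTH = 4351   # consensus cDNA excluding the poly A tail
TOTAL_LENGTH = 4367  # SEQ ID NO: 1 including the poly A tail as listed


def codon_start(residue):
    """Nucleotide position at which the codon for a given residue begins."""
    return CDS_START + 3 * (residue - 1)


def residue_count():
    """Number of codons in the coding region."""
    return (CDS_END - CDS_START + 1) // 3


print(residue_count())                      # codons from 49nt to 3987nt
print(codon_start(164))                     # start of the Lys preceding the Gln repeat
print(TOTAL_LENGTH - CDNA_LENGTH)           # length of the listed poly A tail
```

With these assumptions the coding region spans 1313 codons, the codon for residue 164 begins at 538nt (the preceding line of the listing ends at 537), and the listed poly A tail is 16 nt long.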
[0024] 

Example 2 Measurement of CAG Repeat Units in Sample

The numbers of CAG repeat units were determined by polyacrylamide gel
electrophoresis analysis of PCR products obtained using the primer pair of F-1 and
R-1. PCR was performed in a total volume of 10 µl containing 10 mM Tris-HCl,
pH 8.3, 50 mM KCl, 2.0 mM MgCl2, 1.7 M N,N,N-trimethylglycine, 111 kBq of
[α-32P]dCTP (111 TBq/mmol), 30 µM dCTP, 200 µM each of dATP, dGTP and
TTP, 0.25 µM each of the two primers, 200 ng of genomic DNA and 1.25 U of Taq
DNA polymerase. After an initial 2-min denaturation at 95°C, PCR was performed
for 32 cycles of 1-min denaturation at 95°C, 1-min annealing at 60°C and 1-min
extension at 72°C, followed by a final extension at 72°C for 5 min. Sequence
ladders obtained using the cloned genomic segments of the SCA2 gene, which
contain various sizes of CAG repeats, were used as size markers. For normal
alleles containing one or two CAA interruptions, the numbers of the CAA units were
included in the CAG repeat size. For SCA2 alleles having an expanded CAG region,
the above-mentioned insert sequence immediately after the CAG region was not
included in the size of the CAG region.
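
The counting convention described above (CAA interruptions are counted as part of the repeat) can be sketched for an allele whose sequence is already known. Note that this is not the gel-based sizing used in the Example; it is a minimal Python illustration on a sequence string, and the function name is an assumption.

```python
import re


def count_repeat_units(sequence):
    """Length, in units, of the longest run of consecutive CAG/CAA triplets."""
    runs = re.findall(r"(?:CA[AG])+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Repeat region as listed in SEQ ID NO: 1, with a few flanking nucleotides:
# 13 CAG, one CAA interruption, then 9 CAG.
normal_allele = "AAGCCC" + "CAG" * 13 + "CAA" + "CAG" * 9 + "CCGCCG"
print(count_repeat_units(normal_allele))
```

Because the pattern accepts both CAG and CAA, an interrupted tract is counted as one continuous repeat, in line with the convention stated for normal alleles.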
[0025] 

By the above-described method, the numbers of the CAG repeat units of
normal individuals (286 chromosomes) and 10 pedigrees of SCA2 patients (34 SCA2
chromosomes) were determined. The results are shown in Fig. 7. In Fig. 7, open
bars indicate the results of the normal genes and solid bars indicate the results of the
SCA2 genes.

[0026] 

As is apparent from Fig. 7, in all of the normal genes, the numbers of the
CAG repeat units were not more than 24, while in all of the SCA2 genes, they were
not less than 35. Thus, it was confirmed that the cDNA identified as described
above is the cDNA of the causative gene of SCA2.
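
The separation between the two distributions reported above can be expressed as a simple range check. The following Python sketch uses only the two bounds stated in this paragraph; the function name and the returned labels are illustrative assumptions, and values between the bounds are reported as not observed rather than assigned to either group.

```python
NORMAL_MAX = 24    # "not more than 24" in all normal genes examined
EXPANDED_MIN = 35  # "not less than 35" in all SCA2 genes examined


def classify_allele(repeat_units):
    """Place a repeat count relative to the ranges observed in Fig. 7."""
    if repeat_units <= NORMAL_MAX:
        return "normal range"
    if repeat_units >= EXPANDED_MIN:
        return "SCA2 range"
    return "not observed in this data set"


for n in (22, 30, 41):
    print(n, classify_allele(n))
```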
[0027]
SEQUENCE LISTING
SEQ ID NO: 1
SEQUENCE LENGTH: 4367
SEQUENCE TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear 
SEQUENCE DESCRIPTION 

TATCCGCACC TCCGCTCCCA CCCGGCGCCT CGGCGCGCCC GCCCTCCG ATG CGC TCA 57 

Met Arg Ser 
1 

GCG GCC GCA GCT CCT CGG AGT CCC GCG GTG GCC ACC GAG TCT CGC CGC 105 
Ala Ala Ala Ala Pro Arg Ser Pro Ala Val Ala Thr Glu Ser Arg Arg 

5 10 15 

TTC GCC GCA GCC AGG TGG CCC GGG TGG CGC TCG CTC CAG CGG CCG GCG 153 
Phe Ala Ala Ala Arg Trp Pro Gly Trp Arg Ser Leu Gln Arg Pro Ala
20 25 30 35 

CGG CGG AGC GGG CGG GGC GGC GGT GGC GCG GCC CCG GGA CCG TAT CCC 201 
Arg Arg Ser Gly Arg Gly Gly Gly Gly Ala Ala Pro Gly Pro Tyr Pro 

40 45 50 

TCC GCC GCC CCT CCC CCG CCC GGC CCC GGC CCC CCT CCC TCC CGG CAG 249 
Ser Ala Ala Pro Pro Pro Pro Gly Pro Gly Pro Pro Pro Ser Arg Gln

55 60 65 

AGC TCG CCT CCC TCC GCC TCA GAC TGT TTT GGT AGC AAC GGC AAC GGC 297 
Ser Ser Pro Pro Ser Ala Ser Asp Cys Phe Gly Ser Asn Gly Asn Gly 

70 75 80 

GGC GGC GCG TTT CGG CCC GGC TCC CGG CGG CTC CTT GGT CTC GGC GGG 345 
Gly Gly Ala Phe Arg Pro Gly Ser Arg Arg Leu Leu Gly Leu Gly Gly

85 90 95

CCT CCC CGC CCC TTC GTC GTC GTC CTT CTC CCC CTC GCC AGC CCG GGC 393
Pro Pro Arg Pro Phe Val Val Val Leu Leu Pro Leu Ala Ser Pro Gly
100 105 110 115

GCC CCT CCG GCC GCG CCA ACC CGC GCC TCC CCG CTC GGC GCC CGT GCG 441
Ala Pro Pro Ala Ala Pro Thr Arg Ala Ser Pro Leu Gly Ala Arg Ala

120 125 130

TCC CCG CCG CGT TCC GGC GTC TCC TTG GCG CGC CCG GCT CCC GGC TGT 489
Ser Pro Pro Arg Ser Gly Val Ser Leu Ala Arg Pro Ala Pro Gly Cys

135 140 145

CCC CGC CCG GCG TGC GAG CCG GTG TAT GGG CCC CTC ACC ATG TCG CTG 537
Pro Arg Pro Ala Cys Glu Pro Val Tyr Gly Pro Leu Thr Met Ser Leu

150 155 160

AAG CCC CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA 585
Lys Pro Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln

165 170 175

CAG CAG CAG CAG CAG CAG CAG CAG CAG CCG CCG CCC GCG GCT GCC AAT 633
Gln Gln Gln Gln Gln Gln Gln Gln Gln Pro Pro Pro Ala Ala Ala Asn
180 185 190 195

GTC CGC AAG CCC GGC GGC AGC GGC CTT CTA GCG TCG CCC GCC GCC GCG 681
Val Arg Lys Pro Gly Gly Ser Gly Leu Leu Ala Ser Pro Ala Ala Ala

200 205 210

CCT TCG CCG TCC TCG TCC TCG GTC TCC TCG TCC TCG GCC ACG GCT CCC 729
Pro Ser Pro Ser Ser Ser Ser Val Ser Ser Ser Ser Ala Thr Ala Pro

215 220 225

TCC TCG GTG GTC GCG GCG ACC TCC GGC GGC GGG AGG CCC GGC CTG GGC 777
Ser Ser Val Val Ala Ala Thr Ser Gly Gly Gly Arg Pro Gly Leu Gly

230 235 240

AGA GGT CGA AAC AGT AAC AAA GGA CTG CCT CAG TCT ACG ATT TCT TTT 825
Arg Gly Arg Asn Ser Asn Lys Gly Leu Pro Gln Ser Thr Ile Ser Phe

245 250 255

GAT GGA ATC TAT GCA AAT ATG AGG ATG GTT CAT ATA CTT ACA TCA GTT 873
Asp Gly Ile Tyr Ala Asn Met Arg Met Val His Ile Leu Thr Ser Val
260 265 270 275

GTT GGC TCC AAA TGT GAA GTA CAA GTG AAA AAT GGA GGT ATA TAT GAA 921
Val Gly Ser Lys Cys Glu Val Gln Val Lys Asn Gly Gly Ile Tyr Glu

280 285 290

GGA GTT TTT AAA ACT TAC AGT CCG AAG TGT GAT TTG GTA CTT GAT GCC 969
Gly Val Phe Lys Thr Tyr Ser Pro Lys Cys Asp Leu Val Leu Asp Ala

295 300 305

GCA CAT GAG AAA AGT ACA GAA TCC AGT TCG GGG CCG AAA CGT GAA GAA 1017
Ala His Glu Lys Ser Thr Glu Ser Ser Ser Gly Pro Lys Arg Glu Glu

310 315 320

ATA ATG GAG AGT ATT TTG TTC AAA TGT TCA GAC TTT GTT GTG GTA CAG 1065
Ile Met Glu Ser Ile Leu Phe Lys Cys Ser Asp Phe Val Val Val Gln

325 330 335

TTT AAA GAT ATG GAC TCC AGT TAT GCA AAA AGA GAT GCT TTT ACT GAC 1113
Phe Lys Asp Met Asp Ser Ser Tyr Ala Lys Arg Asp Ala Phe Thr Asp
340 345 350 355

TCT GCT ATC AGT GCT AAA GTG AAT GGC GAA CAC AAA GAG AAG GAC CTG 1161
Ser Ala Ile Ser Ala Lys Val Asn Gly Glu His Lys Glu Lys Asp Leu

360 365 370

GAG CCC TGG GAT GCA GGT GAA CTC ACA GCC AAT GAG GAA CTT GAG GCT 1209
Glu Pro Trp Asp Ala Gly Glu Leu Thr Ala Asn Glu Glu Leu Glu Ala
375 380 385




TTG GAA AAT GAC GTA TCT AAT GGA TGG GAT CCC AAT GAT ATG TTT CGA 1257 
Leu Glu Asn Asp Val Ser Asn Gly Trp Asp Pro Asn Asp Met Phe Arg 

390 395 400 

TAT AAT GAA GAA AAT TAT GGT GTA GTG TCT ACG TAT GAT AGC AGT TTA 1305
Tyr Asn Glu Glu Asn Tyr Gly Val Val Ser Thr Tyr Asp Ser Ser Leu 

405 410 415 

TCT TCG TAT ACA GTG CCC TTA GAA AGA GAT AAC TCA GAA GAA TTT TTA 1353 
Ser Ser Tyr Thr Val Pro Leu Glu Arg Asp Asn Ser Glu Glu Phe Leu 
420 425 430 435 

AAA CGG GAA GCA AGG GCA AAC CAG TTA GCA GAA GAA ATT GAG TCA AGT 1401 
Lys Arg Glu Ala Arg Ala Asn Gln Leu Ala Glu Glu Ile Glu Ser Ser

440 445 450 

GCC CAG TAC AAA GCT CGA GTG GCC CTG GAA AAC GAT GAT AGG AGT GAG 1449 
Ala Gln Tyr Lys Ala Arg Val Ala Leu Glu Asn Asp Asp Arg Ser Glu

455 460 465 

GAA GAA AAA TAC ACA GCA GTT CAG AGA AAT TCC AGT GAA CGT GAG GGG 1497 
Glu Glu Lys Tyr Thr Ala Val Gln Arg Asn Ser Ser Glu Arg Glu Gly

470 475 480 

CAC AGC ATA AAC ACT AGG GAA AAT AAA TAT ATT CCT CCT GGA CAA AGA 1545 
His Ser Ile Asn Thr Arg Glu Asn Lys Tyr Ile Pro Pro Gly Gln Arg

485 490 495 

AAT AGA GAA GTC ATA TCC TGG GGA AGT GGG AGA CAG AAT TCA CCG CGT 1593 
Asn Arg Glu Val Ile Ser Trp Gly Ser Gly Arg Gln Asn Ser Pro Arg
500 505 510 515 

ATG GGC CAG CCT GGA TCG GGC TCC ATG CCA TCA AGA TCC ACT TCT CAC 1641 
Met Gly Gln Pro Gly Ser Gly Ser Met Pro Ser Arg Ser Thr Ser His

520 525 530 

ACT TCA GAT TTC AAC CCG AAT TCT GGT TCA GAC CAA AGA GTA GTT AAT 1689
Thr Ser Asp Phe Asn Pro Asn Ser Gly Ser Asp Gln Arg Val Val Asn

535 540 545 

GGA GGT GTT CCC TGG CCA TCG CCT TGC CCA TCT CCT TCC TCT CGC CCA 1737 
Gly Gly Val Pro Trp Pro Ser Pro Cys Pro Ser Pro Ser Ser Arg Pro 

550 555 560 

CCT TCT CGC TAC CAG TCA GGT CCC AAC TCT CTT CCA CCT CGG GCA GCC 1785 
Pro Ser Arg Tyr Gln Ser Gly Pro Asn Ser Leu Pro Pro Arg Ala Ala

565 570 575 

ACC CCT ACA CGG CCG CCC TCC AGG CCC CCC TCG CGG CCA TCC AGA CCC 1833 
Thr Pro Thr Arg Pro Pro Ser Arg Pro Pro Ser Arg Pro Ser Arg Pro 
580 585 590 595 

CCG TCT CAC CCC TCT GCT CAT GGT TCT CCA GCT CCT GTC TCT ACT ATG 1881 
Pro Ser His Pro Ser Ala His Gly Ser Pro Ala Pro Val Ser Thr Met 

600 605 610 

CCT AAA CGC ATG TCT TCA GAA GGG CCT CCA AGG ATG TCC CCA AAG GCC 1929 
Pro Lys Arg Met Ser Ser Glu Gly Pro Pro Arg Met Ser Pro Lys Ala 

615 620 625 

CAG CGA CAT CCT CGA AAT CAC AGA GTT TCT GCT GGG AGG GGT TCC ATA 1977 
Gln Arg His Pro Arg Asn His Arg Val Ser Ala Gly Arg Gly Ser Ile

630 635 640 

TCC AGT GGC CTA GAA TTT GTA TCC CAC AAC CCA CCC AGT GAA GCA GCT 2025 
Ser Ser Gly Leu Glu Phe Val Ser His Asn Pro Pro Ser Glu Ala Ala 

645 650 655 

ACT CCT CCA GTA GCA AGG ACC AGT CCC TCG GGG GGA ACG TGG TCA TCA 2073 
Thr Pro Pro Val Ala Arg Thr Ser Pro Ser Gly Gly Thr Trp Ser Ser 
660 665 670 675 

GTG GTC AGT GGG GTT CCA AGA TTA TCC CCT AAA ACT CAT AGA CCC AGG 2121 
Val Val Ser Gly Val Pro Arg Leu Ser Pro Lys Thr His Arg Pro Arg 
680 685 690

TCT CCC AGA CAG AAC AGT ATT GGA AAT ACC CCC AGT GGG CCA GTT CTT 2169
Ser Pro Arg Gln Asn Ser Ile Gly Asn Thr Pro Ser Gly Pro Val Leu

695 700 705 

GCT TCT CCC CAA GCT GGT ATT ATT CCA ACT GAA GCT GTT GCC ATG CCT 2217 
Ala Ser Pro Gln Ala Gly Ile Ile Pro Thr Glu Ala Val Ala Met Pro

710 715 720 

ATT CCA GCT GCA TCT CCT ACG CCT GCT AGT CCT GCA TCG AAC AGA GCT 2265 
Ile Pro Ala Ala Ser Pro Thr Pro Ala Ser Pro Ala Ser Asn Arg Ala

725 730 735 

GTT ACC CCT TCT AGT GAG GCT AAA GAT TCC AGG CTT CAA GAT CAG AGG 2313 
Val Thr Pro Ser Ser Glu Ala Lys Asp Ser Arg Leu Gln Asp Gln Arg
740 745 750 755 

CAG AAC TCT CCT GCA GGG AAT AAA GAA AAT ATT AAA CCC AAT GAA ACA 2361 
Gln Asn Ser Pro Ala Gly Asn Lys Glu Asn Ile Lys Pro Asn Glu Thr

760 765 770 

TCA CCT AGC TTC TCA AAA GCT GAA AAC AAA GGT ATA TCA CCA GTT GTT 2409 
Ser Pro Ser Phe Ser Lys Ala Glu Asn Lys Gly Ile Ser Pro Val Val

775 780 785 

TCT GAA CAT AGA AAA CAG ATT GAT GAT TTA AAG AAA TTT AAG AAT GAT 2457 
Ser Glu His Arg Lys Gln Ile Asp Asp Leu Lys Lys Phe Lys Asn Asp

790 795 800 

TTT AGG TTA CAG CCA AGT TCT ACT TCT GAA TCT ATG GAT CAA CTA CTA 2505 
Phe Arg Leu Gln Pro Ser Ser Thr Ser Glu Ser Met Asp Gln Leu Leu

805 810 815 

AAC AAA AAT AGA GAG GGA GAA AAA TCA AGA GAT TTG ATC AAA GAC AAA 2553 
Asn Lys Asn Arg Glu Gly Glu Lys Ser Arg Asp Leu Ile Lys Asp Lys
820 825 830 835 
ATT GAA CCA AGT GCT AAG GAT TCT TTC ATT GAA AAT AGC AGC AGC AAC 2601
Ile Glu Pro Ser Ala Lys Asp Ser Phe Ile Glu Asn Ser Ser Ser Asn

840 845 850 

TGT ACC AGT GGC AGC AGC AAG CCG AAT AGC CCC AGC ATT TCC CCT TCA 2649 
Cys Thr Ser Gly Ser Ser Lys Pro Asn Ser Pro Ser Ile Ser Pro Ser

855 860 865 

ATA CTT AGT AAC ACG GAG CAC AAG AGG GGA CCT GAG GTC ACT TCC CAA 2697 
Ile Leu Ser Asn Thr Glu His Lys Arg Gly Pro Glu Val Thr Ser Gln

870 875 880 

GGG GTT CAG ACT TCC AGC CCA GCA TGT AAA CAA GAG AAA GAC GAT AAG 2745 
Gly Val Gln Thr Ser Ser Pro Ala Cys Lys Gln Glu Lys Asp Asp Lys

885 890 895 

GAA GAG AAG AAA GAC GCA GCT GAG CAA GTT AGG AAA TCA ACA TTG AAT 2793 
Glu Glu Lys Lys Asp Ala Ala Glu Gln Val Arg Lys Ser Thr Leu Asn
900 905 910 915 

CCC AAT GCA AAG GAG TTC AAC CCA CGT TCC TTC TCT CAG CCA AAG CCT 2841 
Pro Asn Ala Lys Glu Phe Asn Pro Arg Ser Phe Ser Gln Pro Lys Pro

920 925 930 

TCT ACT ACC CCA ACT TCA CCT CGG CCT CAA GCA CAA CCT AGC CCA TCT 2889 
Ser Thr Thr Pro Thr Ser Pro Arg Pro Gln Ala Gln Pro Ser Pro Ser

935 940 945 

ATG GTG GGT CAT CAA CAG CCA ACT CCA GTT TAT ACT CAG CCT GTT TGT 2937 
Met Val Gly His Gln Gln Pro Thr Pro Val Tyr Thr Gln Pro Val Cys

950 955 960 

TTT GCA CCA AAT ATG ATG TAT CCA GTC CCA GTG AGC CCA GGC GTG CAA 2985 
Phe Ala Pro Asn Met Met Tyr Pro Val Pro Val Ser Pro Gly Val Gln

965 970 975 

CCT TTA TAC CCA ATA CCT ATG ACG CCC ATG CCA GTG AAT CAA GCC AAG 3033
Pro Leu Tyr Pro Ile Pro Met Thr Pro Met Pro Val Asn Gln Ala Lys

980 985 990 995 

ACA TAT AGA GCA GTA CCA AAT ATG CCC CAA CAG CGG CAA GAC CAG CAT 3081
Thr Tyr Arg Ala Val Pro Asn Met Pro Gln Gln Arg Gln Asp Gln His

1000 1005 1010 

CAT CAG AGT GCC ATG ATG CAC CCA GCG TCA GCA GCG GGC CCA CCG ATT 3129 
His Gln Ser Ala Met Met His Pro Ala Ser Ala Ala Gly Pro Pro Ile

1015 1020 1025 

GCA GCC ACC CCA CCA GCT TAC TCC ACG CAA TAT GTT GCC TAC AGT CCT 3177 
Ala Ala Thr Pro Pro Ala Tyr Ser Thr Gln Tyr Val Ala Tyr Ser Pro

1030 1035 1040 

CAG CAG TTC CCA AAT CAG CCC CTT GTT CAG CAT GTG CCA CAT TAT CAG 3225 
Gln Gln Phe Pro Asn Gln Pro Leu Val Gln His Val Pro His Tyr Gln

1045 1050 1055 

TCT CAG CAT CCT CAT GTC TAT AGT CCT GTA ATA CAG GGT AAT GCT AGA 3273 
Ser Gln His Pro His Val Tyr Ser Pro Val Ile Gln Gly Asn Ala Arg
1060 1065 1070 1075 

ATG ATG GCA CCA CCA ACA CAC GCC CAG CCT GGT TTA GTA TCT TCT TCA 3321 
Met Met Ala Pro Pro Thr His Ala Gln Pro Gly Leu Val Ser Ser Ser

1080 1085 1090 

GCA ACT CAG TAC GGG GCT CAT GAG CAG ACG CAT GCG ATG TAT GCA TGT 3369 
Ala Thr Gln Tyr Gly Ala His Glu Gln Thr His Ala Met Tyr Ala Cys

1095 1100 1105 

CCC AAA TTA CCA TAC AAC AAG GAG ACA AGC CCT TCT TTC TAC TTT GCC 3417 
Pro Lys Leu Pro Tyr Asn Lys Glu Thr Ser Pro Ser Phe Tyr Phe Ala 

1110 1115 1120 

ATT TCC ACG GGC TCC CTT GCT CAG CAG TAT GCG CAC CCT AAC GCT ACC 3465
Ile Ser Thr Gly Ser Leu Ala Gln Gln Tyr Ala His Pro Asn Ala Thr

1125 1130 1135

CTG CAC CCA CAT ACT CCA CAC CCT CAG CCT TCA GCT ACC CCC ACT GGA 3513
Leu His Pro His Thr Pro His Pro Gln Pro Ser Ala Thr Pro Thr Gly
1140 1145 1150 1155 

CAG CAG CAA AGC CAA CAT GGT GGA AGT CAT CCT GCA CCC AGT CCT GTT 3561 
Gln Gln Gln Ser Gln His Gly Gly Ser His Pro Ala Pro Ser Pro Val

1160 1165 1170 

CAG CAC CAT CAG CAC CAG GCC GCC CAG GCT CTC CAT CTG GCC AGT CCA 3609 
Gln His His Gln His Gln Ala Ala Gln Ala Leu His Leu Ala Ser Pro

1175 1180 1185 

CAG CAG CAG TCA GCC ATT TAC CAC GCG GGG CTT GCG CCA ACT CCA CCC 3657 
Gln Gln Gln Ser Ala Ile Tyr His Ala Gly Leu Ala Pro Thr Pro Pro

1190 1195 1200 

TCC ATG ACA CCT GCC TCC AAC ACG CAG TCG CCA CAG AAT AGT TTC CCA 3705 
Ser Met Thr Pro Ala Ser Asn Thr Gln Ser Pro Gln Asn Ser Phe Pro

1205 1210 1215 

GCA GCA CAA CAG ACT GTC TTT ACG ATC CAT CCT TCT CAC GTT CAG CCG 3753 
Ala Ala Gln Gln Thr Val Phe Thr Ile His Pro Ser His Val Gln Pro
1220 1225 1230 1235 

GCG TAT ACC AAC CCA CCC CAC ATG GCC CAC GTA CCT CAG GCT CAT GTA 3801 
Ala Tyr Thr Asn Pro Pro His Met Ala His Val Pro Gln Ala His Val

1240 1245 1250 

CAG TCA GGA ATG GTT CCT TCT CAT CCA ACT GCC CAT GCG CCA ATG ATG 3849 
Gln Ser Gly Met Val Pro Ser His Pro Thr Ala His Ala Pro Met Met

1255 1260 1265 

CTA ATG ACG ACA CAG CCA CCC GGC GGT CCC CAG GCC GCC CTC GCT CAA 3897 
Leu Met Thr Thr Gln Pro Pro Gly Gly Pro Gln Ala Ala Leu Ala Gln
1270 1275 1280 
AGT GCA CTA CAG CCC ATT CCA GTC TCG ACA ACA GCG CAT TTC CCC TAT 3945
Ser Ala Leu Gln Pro Ile Pro Val Ser Thr Thr Ala His Phe Pro Tyr

1285 1290 1295 

ATG ACG CAC CCT TCA GTA CAA GCC CAC CAC CAA CAG CAG TTG 3987 
Met Thr His Pro Ser Val Gln Ala His His Gln Gln Gln Leu
1300 1305 1310 

TAAGGCTGCC CTGGAGGAAC CGAAAGGCCA AATTCCCTCC TCCCTTCTAC TGCTTCTACC 4047 
AACTGGAAGC ACAGAAAACT AGAATTTCAT TTATTTTGTT TTTAAAATAT ATATGTTGAT 4107 
TTCTTGTAAC ATCCAATAGG AATGCTAACA GTTCACTTGC AGTGGAAGAT ACTTGGACCG 4167 
AGTAGAGGCA TTTAGGAACT TGGGGGCTAT TCCATAATTC CATATGCTGT TTCAGAGTCC 4227 
CGCAGGTACC CCAGCTCTGC TTGCCGAAAC TGGAAGTTAT TTATTTTTTA ATAACCCTTG 4287 
AAAGTCATGA ACACATCAGC TAGCAAAAGA AGTAACAAGA GTGATTCTTG CTGCTATTAC 4347 
TGCTAAAAAA AAAAAAAAAA 4367 
[BRIEF DESCRIPTION OF THE DRAWINGS] 

[Fig. 1] 

A drawing which shows the nucleotide sequence of the cDNA fragment 
according to the present invention together with the amino acid sequence encoded 
thereby, which nucleotide sequence was determined in the Examples of the present 
invention. 

[Fig. 2] 

A drawing which shows the continuation of Fig. 1.
[Fig. 3] 

A drawing which shows the continuation of Fig. 2. 
[Fig. 4] 

A drawing which shows the continuation of Fig. 3. 
[Fig. 5] 

A pedigree chart of the SCA2 patients who donated the genomic DNAs used in
the Examples.
[Fig. 6] 

A drawing which shows the sizes, positions and restriction sites of the genomic 
DNA fragments Tsp1 and Tsp2 and of the SCA2 cDNA obtained in the Examples of the 
present invention, as well as the size and position of each of the obtained cDNA 
fragments. 

[Fig. 7] 

A drawing which shows the distribution of the numbers of the CAG repeat units 
in the normal and SCA2 genes, which were measured by using the (CAG)55 probe. 
1 TATCCGCACCTCCGCTCCCACCCGGCGCCTCGGCGCGCCCGCCCTCCGATGCGCTCAGCG 
F-1006 
1 MRSA 
61 GCCGCAGCTCCTCGGAGTCCCGCGGTGGCCACCGAGTCTCGCCGCTTCGCCGCAGCCAGG 
5 AAAPRSPAVATESRRFAAAR 
121 TGGCCCGGGTGGCGCTCGCTCCAGCGGCCGGCGCGGCGGAGCGGGCGGGGCGGCGGTGGC 
25 WPGWRSLQRPARRSGRGGGG 
181 GCGGCCCCGGGACCGTATCCCTCCGCCGCCCCTCCCCCGCCCGGCCCCGGCCCCCCTCCC 
45 AAPGPYPSAAPPPPGPGPPP 
241 TCCCGGCAGAGCTCGCCTCCCTCCGCCTCAGACTGTTTTGGTAGCAACGGCAACGGCGGC 
65 SRQSSPPSASDCFGSNGNGG 
301 GGCGCGTTTCGGCCCGGCTCCCGGCGGCTCCTTGGTCTCGGCGGGCCTCCCCGCCCCTTC 
R-1002 
85 GAFRPGSRRLLGLGGPPRPF 
361 GTCGTCGTCCTTCTCCCCCTCGCCAGCCCGGGCGCCCCTCCGGCCGCGCCAACCCGCGCC 
105 VVVLLPLASPGAPPAAPTRA 
421 TCCCCGCTCGGCGCCCGTGCGTCCCCGCCGCGTTCCGGCGTCTCCTTGGCGCGCCCGGCT 
125 SPLGARASPPRSGVSLARPA 
481 CCCGGCTGTCCCCGCCCGGCGTGCGAGCCGGTGTATGGGCCCCTCACCATGTCGCTGAAG 
F-1 
145 PGCPRPACEPVYGPLTMSLK 
541 CCCCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCAGCAGCAGCAG 
165 PQQQQQQQQQQQQQQQQQQQ 
601 CAGCAGCAGCAGCCGCCGCCCGCGGCTGCCAATGTCCGCAAGCCCGGCGGCAGCGGCCTT 
R-2 
185 QQQQPPPAAANVRKPGGSGL 
661 CTAGCGTCGCCCGCCGCCGCGCCTTCGCCGTCCTCGTCCTCGGTCTCCTCGTCCTCGGCC 
R-1 
205 LASPAAAPSPSSSSVSSSSA 
721 ACGGCTCCCTCCTCGGTGGTCGCGGCGACCTCCGGCGGCGGGAGGCCCGGCCTGGGCAGA 
225 TAPSSVVAATSGGGRPGLGR 
781 GGTCGAAACAGTAACAAAGGACTGCCTCAGTCTACGATTTCTTTTGATGGAATCTATGCA 
245 GRNSNKGLPQSTISFDGIYA 
841 AATATGAGGATGGTTCATATACTTACATCAGTTGTTGGCTCCAAATGTGAAGTACAAGTG 
265 NMRMVHILTSVVGSKCEVQV 
901 AAAAATGGAGGTATATATGAAGGAGTTTTTAAAACTTACAGTCCGAAGTGTGATTTGGTA 
285 KNGGIYEGVFKTYSPKCDLV 
961 CTTGATGCCGCACATGAGAAAAGTACAGAATCCAGTTCGGGGCCGAAACGTGAAGAAATA 
305 LDAAHEKSTESSSGPKREEI 
1021 ATGGAGAGTATTTTGTTCAAATGTTCAGACTTTGTTGTGGTACAGTTTAAAGATATGGAC 
325 MESILFKCSDFVVVQFKDMD 
1081 TCCAGTTATGCAAAAAGAGATGCTTTTAGTGACTCTGCTATCAGTGCTAAAGTGAATGGC 
345 SSYAKRDAFTDSAISAKVNG 
1141 GAACACAAAGAGAAGGACCTGGAGCCCTGGGATGCAGGTGAACTCACAGCCAATGAGGAA 
365 EHKEKDLEPWDAGELTANEE 
1201 CTTGAGGCTTTGGAAAATGACGTATCTAATGGATGGGATCCCAATGATATGTTTCGATAT 
385 LEALENDVSNGWDPNDMFRY 
1261 AATGAAGAAAATTATGGTGTAGTGTCTACGTATGATAGCAGTTTATCTTCGTATACAGTG 
405 NEENYGVVSTYDSSLSSYTV 
1321 CCCTTAGAAAGAGATAACTCAGAAGAATTTTTAAAACGGGAAGCAAGGGCAAACCAGTTA 
425 PLERDNSEEFLKREARANQL 
1381 GCAGAAGAAATTGAGTCAAGTGCCCAGTACAAAGCTCGAGTGGCCCTGGAAAACGATGAT 
445 AEEIESSAQYKARVALENDD 
1441 AGGAGTGAGGAAGAAAAATACACAGCAGTTCAGAGAAATTCCAGTGAACGTGAGGGGCAC 
465 RSEEEKYTAVQRNSSEREGH 
1501 AGCATAAACACTAGGGAAAATAAATATATTCCTCCTGGACAAAGAAATAGAGAAGTCATA 
485 SINTRENKYIPPGQRNREVI 
1561 TCCTGGGGAAGTGGGAGACAGAATTCACCGCGTATGGGCCAGCCTGGATCGGGCTCCATG 
505 SWGSGRQNSPRMGQPGSGSM 
1621 CCATCAAGATCCACTTCTCACACTTCAGATTTCAACCCGAATTCTGGTTCAGACCAAAGA 
525 PSRSTSHTSDFNPNSGSDQR 
1681 GTAGTTAATGGAGGTGTTCCCTGGCCATCGCCTTGCCCATCTCCTTCCTCTCGCCCACCT 
545 VVNGGVPWPSPCPSPSSRPP 
1741 TCTCGCTACCAGTCAGGTCCCAACTCTCTTCCACCTCGGGCAGCCACCCCTACACGGCCG 
565 SRYQSGPNSLPPRAATPTRP 
1801 CCCTCCAGGCCCCCCTCGCGGCCATCCAGACCCCCGTCTCACCCCTCTGCTCATGGTTCT 
585 PSRPPSRPSRPPSHPSAHGS 
1861 CCAGCTCCTGTCTCTACTATGCCTAAACGCATGTCTTCAGAAGGGCCTCCAAGGATGTCC 
605 PAPVSTMPKRMSSEGPPRMS 
1921 CCAAAGGCCCAGCGACATCCTCGAAATCACAGAGTTTCTGCTGGGAGGGGTTCCATATCC 
625 PKAQRHPRNHRVSAGRGSIS 
1981 AGTGGCCTAGAATTTGTATCCCACAACCCACCCAGTGAAGCAGCTACTCCTCCAGTAGCA 
645 SGLEFVSHNPPSEAATPPVA 
2041 AGGACCAGTCCCTCGGGGGGAACGTGGTCATCAGTGGTCAGTGGGGTTCCAAGATTATCC 
665 RTSPSGGTWSSVVSGVPRLS 
2101 CCTAAAACTCATAGACCCAGGTCTCCCAGACAGAACAGTATTGGAAATACCCCCAGTGGG 
685 PKTHRPRSPRQNSIGNTPSG 
2161 CCAGTTCTTGCTTCTCCCCAAGCTGGTATTATTCCAACTGAAGCTGTTGCCATGCCTATT 
705 PVLASPQAGIIPTEAVAMPI 
2221 CCAGCTGCATCTCCTACGCCTGCTAGTCCTGCATCGAACAGAGCTGTTACCCCTTCTAGT 
725 PAASPTPASPASNRAVTPSS 
2281 GAGGCTAAAGATTCCAGGCTTCAAGATCAGAGGCAGAACTCTCCTGCAGGGAATAAAGAA 
745 EAKDSRLQDQRQNSPAGNKE 
2341 AATATTAAACCCAATGAAACATCACCTAGCTTCTCAAAAGCTGAAAACAAAGGTATATCA 
765 NIKPNETSPSFSKAENKGIS 
2401 CCAGTTGTTTCTGAACATAGAAAACAGATTGATGATTTAAAGAAATTTAAGAATGATTTT 
785 PVVSEHRKQIDDLKKFKNDF 
2461 AGGTTACAGCCAAGTTCTACTTCTGAATCTATGGATCAACTACTAAACAAAAATAGAGAG 
805 RLQPSSTSESMDQLLNKNRE 
2521 GGAGAAAAATCAAGAGATTTGATCAAAGACAAAATTGAACCAAGTGCTAAGGATTCTTTC 
825 GEKSRDLIKDKIEPSAKDSF 
2581 ATTGAAAATAGCAGCAGCAACTGTACCAGTGGCAGCAGCAAGCCGAATAGCCCCAGCATT 
845 IENSSSNCTSGSSKPNSPSI 
2641 TCCCCTTCAATACTTAGTAACACGGAGCACAAGAGGGGACCTGAGGTCACTTCCCAAGGG 
865 SPSILSNTEHKRGPEVTSQG 
2701 GTTCAGACTTCCAGCCCAGCATGTAAACAAGAGAAAGACGATAAGGAAGAGAAGAAAGAC 
885 VQTSSPACKQEKDDKEEKKD 
2761 GCAGCTGAGCAAGTTAGGAAATCAACATTGAATCCCAATGCAAAGGAGTTCAACCCACGT 
905 AAEQVRKSTLNPNAKEFNPR 
2821 TCCTTCTCTCAGCCAAAGCCTTCTACTACCCCAACTTCACCTCGGCCTCAAGCACAACCT 
r-13 
925 SFSQPKPSTTPTSPRPQAQP 
2881 AGCCCATCTATGGTGGGTCATCAACAGCCAACTCCAGTTTATACTCAGCCTGTTTGTTTT 
945 SPSMVGHQQPTPVYTQPVCF 
2941 GCACCAAATATGATGTATCCAGTCCCAGTGAGCCCAGGCGTGCAACCTTTATACCCAATA 
965 APNMMYPVPVSPGVQPLYPI 
3001 CCTATGACGCCCATGCCAGTGAATCAAGCCAAGACATATAGAGCAGTACCAAATATGCCC 
985 PMTPMPVNQAKTYRAVPNMP 
3061 CAACAGCGGCAAGACCAGCATCATCAGAGTGCCATGATGCACCCAGCGTCAGCAGCGGGC 
1005 QQRQDQHHQSAMMHPASAAG 
3121 CCACCGATTGCAGCCACCCCACCAGCTTACTCCACGCAATATGTTGCCTACAGTCCTCAG 
1025 PPIAATPPAYSTQYVAYSPQ 
3181 CAGTTCCCAAATCAGCCCCTTGTTCAGCATGTGCCACATTATCAGTCTCAGCATCCTCAT 
1045 QFPNQPLVQHVPHYQSQHPH 
3241 GTCTATAGTCCTGTAATACAGGGTAATGCTAGAATGATGGCACCACCAACACACGCCCAG 
1065 VYSPVIQGNARMMAPPTHAQ 
3301 CCTGGTTTAGTATCTTCTTCAGCAACTCAGTACGGGGCTCATGAGCAGACGCATGCGATG 
1085 PGLVSSSATQYGAHEQTHAM 
3361 TATGCATGTCCCAAATTACCATACAACAAGGAGACAAGCCCTTCTTTCTACTTTGCCATT 
1105 YACPKLPYNKETSPSFYFAI 
3421 TCCACGGGCTCCCTTGCTCAGCAGTATGCGCACCCTAACGCTACCCTGCACCCACATACT 
1125 STGSLAQQYAHPNATLHPHT 
3481 CCACACCCTCAGCCTTCAGCTACCCCCACTGGACAGCAGCAAAGCCAACATGGTGGAAGT 
1145 PHPQPSATPTGQQQSQHGGS 
3541 CATCCTGCACCCAGTCCTGTTCAGCACCATCAGCACCAGGCCGCCCAGGCTCTCCATCTG 
1165 HPAPSPVQHHQHQAAQALHL 
3601 GCCAGTCCACAGCAGCAGTCAGCCATTTACCACGCGGGGCTTGCGCCAACTCCACCCTCC 
1185 ASPQQQSAIYHAGLAPTPPS 
3661 ATGACACCTGCCTCCAACACGCAGTCGCCACAGAATAGTTTCCCAGCAGCACAACAGACT 
1205 MTPASNTQSPQNSFPAAQQT 
3721 GTCTTTACGATCCATCCTTCTCACGTTCAGCCGGCGTATACCAACCCACCCCACATGGCC 
1225 VFTIHPSHVQPAYTNPPHMA 
3781 CACGTACCTCAGGCTCATGTACAGTCAGGAATGGTTCCTTCTCATCCAACTGCCCATGCG 
1245 HVPQAHVQSGMVPSHPTAHA 
3841 CCAATGATGCTAATGACGACACAGCCACCCGGCGGTCCCCAGGCCGCCCTCGCTCAAAGT 
1265 PMMLMTTQPPGGPQAALAQS 
3901 GCACTACAGCCCATTCCAGTCTCGACAACAGCGCATTTCCCCTATATGACGCACCCTTCA 
1285 ALQPIPVSTTAHFPYMTHPS 
3961 GTACAAGCCCACCACCAACAGCAGTTGTAAGGCTGCCCTGGAGGAACCGAAAGGCCAAAT 
1305 VQAHHQQQL* 
4021 TCCCTCCTCCCTTCTACTGCTTCTACCAACTGGAAGCACAGAAAACTAGAATTTCATTTA 
4081 TTTTGTTTTTAAAATATATATGTTGATTTCTTGTAACATCCAATAGGAATGCTAACAGTT 
4141 CACTTGCAGTGGAAGATACTTGGACCGAGTAGAGGCATTTAGGAACTTGGGGGCTATTCC 
4201 ATAATTCCATATGCTGTTTCAGAGTCCCGCAGGTACCCCAGCTCTGCTTGCCGAAACTGG 
4261 AAGTTATTTATTTTTTAATAACCCTTGAAAGTCATGAACACATCAGCTAGCAAAAGAAGT 
4321 AACAAGAGTGATTCTTGCTGCTATTACTGCT(A)n 
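The number of CAG/CAA repeat units in a sequenced cDNA can be counted directly from the nucleotide sequence. The following Python sketch is illustrative only and is not part of the Examples. It assumes the three 60-nucleotide rows covering nt 481 to 660 as printed in the drawings and the coding-region start at nt 49 stated in claim 2. The names `FRAGMENT`, `longest_gln_tract` and `residue_number` are hypothetical helpers introduced here.

```python
# Illustrative sketch: count the in-frame CAG/CAA repeat units in nt 481-660
# of the cDNA as printed in the drawings. All names are hypothetical.

ORF_START = 49        # first nucleotide of the coding region (claim 2)
FRAGMENT_START = 481  # first nucleotide of the fragment below; in frame with nt 49

FRAGMENT = (
    # nt 481-540
    "CCCGGCTGTCCCCGCCCGGCGTGCGAGCCGGTGTATGGGCCCCTCACCATGTCGCTGAAG"
    # nt 541-600: CCC, then 13 x CAG, one CAA, 5 x CAG
    + "CCC" + "CAG" * 13 + "CAA" + "CAG" * 5
    # nt 601-660
    + "CAGCAGCAGCAGCCGCCGCCCGCGGCTGCCAATGTCCGCAAGCCCGGCGGCAGCGGCCTT"
)


def longest_gln_tract(seq, first_nt):
    """Return (start_nt, end_nt, units) of the longest in-frame CAG/CAA run."""
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    best = (0, 0, 0)
    run_start = None
    # A sentinel codon closes a run that reaches the end of the sequence.
    for idx, codon in enumerate(codons + ["END"]):
        if codon in ("CAG", "CAA"):
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            units = idx - run_start
            if units > best[2]:
                best = (first_nt + 3 * run_start, first_nt + 3 * idx - 1, units)
            run_start = None
    return best


def residue_number(nt):
    """Amino acid position of the codon containing nucleotide `nt`."""
    return (nt - ORF_START) // 3 + 1


start_nt, end_nt, units = longest_gln_tract(FRAGMENT, FRAGMENT_START)
print(start_nt, end_nt, units)
print(residue_number(start_nt), residue_number(end_nt))
```

For the sequence shown in the drawings, the run found in this way corresponds to the Gln residues at positions 166 to 188. An expanded allele would give a larger unit count by the same procedure.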